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ABSTRACT 



Context. Higher resolution telescopes as well as 3D numerical simulations will require the development of detailed 3D radiative 

transfer calculations. Building upon our previous work we extend our method to include both continuum and line transfer. 

Aims. We present a general method to calculate radiative transfer including scattering in the continuum as well as in lines in 3D static 

atmospheres. 

Methods. The scattering problem for line transfer is solved via means of an operator splitting (OS) technique. The formal solution is 
based on a long-characteristics method. The approximate A operator is constructed considering nearest neighbors exactly. The code 
is parallelized over both wavelength and solid angle using the MPI library. 

Results. We present the results of several test cases with different values of the thermalization parameter and two choices for the 
temperature structure. The results are directly compared to ID spherical tests. With our current grid setup the interior resolution is 
much lower in 3D than in ID, nevertheless the 3D results agree very well with the well-tested ID calculations. We show that with 
relatively simple parallelization that the code scales to very large number of processors which is mandatory for practical applications. 
Conclusions. Advances in modern computers will make realistic 3D radiative transfer calculations possible in the near future. Our 
current code scales to very large numbers of processors, but requires larger memory per processor at high spatial resolution. 
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1. Introduction 

Hydrodynamical calculations in two and three spatial dimen- 
sions are necessary in a broad range of astrophysical con- 
texts. With modern parallel supercomputers, they are also be- 
coming more realistic, in that they can be run at modest 
to high resolution. Performing full radiation hydrodynamical 
calculatio ns is presently still too co mputationally expensive. 
Recently, Hubenv & Burrows! ( 2006b have presented a mixed 
frame method of solving the time-dependent radiative trans- 
fer problem in 2D, but their work is tailored toward neu- 
trino transport where the absence of rapidly changing opacity 
such as a spectral line makes their app roximations appropriate . 
Similar work has b een presented by iMihalas & Kleinl dl982f) . 
Lowrie et alj dl999h. andlLowrie & Morell d200ll). Taking a dif- 
ferent approach Krumholz et alJ d2006h derive the flux-limiter 
to 0(v/c) for use in radiation hydrodynamics calculations . 
Similar work was presented by ICooperstein & Baronl dl992l) . 
Even though these recent first steps are improvements, they suf- 
fer from a loss of accuracy, either in dealing with spectral lines, 
or in obtaining the correct angular dependence of the photon dis- 
tribution function or the specific intensity. While these recent 
works are expedient, they are crude enough that the results of the 
hydrodynamical calculations cannot be compared directly with 
observed spectra. Given the fact that computational resources 
are finite, a final post-processing step is necessary to compare 
the results of hydrodynamical calculations to observations. 



fer equation for scattering continua in 3D (when we say 3D 
we mean three spatial dimensions, plus three momentum di- 
mensions) for the time independent, static case. Here we ex- 
tend our method to include transfer in lines including the case 



that th e line is scattering dominated. iFabiani Bendicho et al 



dl997l) presented a multi-level, multi-grid, multi-dimensional 
radiative transfer scheme, using a lower triangular ALO and 
solving the scattering problem v ia a Gauss-Seidel method, 
van Noort, Hub env. & Lanz (2002) presented a method of solv- 
ing the full NLTE radiative transfer problem using the short 
characteristics method in 2-D for Cartesian, spherical, and 
cylindrical geometry. They also used the technique of acceler- 
ated l ambda iteration ( ALI) ( Olson et alii 1987 ; Olson & Kunasz 
Il987l) . however they restricted themselves to the case of a diag- 
onal accelerated lambda operator (ALO). 

We describe our method, its rate of convergence, and present 
comparisons to our well-tested 1-D calculations. 

2. Method 



In lHauschildt & Baronl d200q, hereafter: Paper I) we de- 
scribed a framework for the solution of the radiative trans- 



In the following discussion we use notation of lHauschi ldt ( 1992) 
and Paper I. The basic framework and the methods used for the 
formal solution and the solution of the scattering problem via 
operator splitting are discussed in detail in paper I and will thus 
not be repeated here. We have extended the framework to solve 
line transfer problems with a background contin uum. The basic 
approach is similar to that of IHauschildti dl993l) . In the simple 
case of a 2-level atom with background continuum we consider 
here as a test case, we use a wavelength grid that covers the pro- 
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file of the line including the surrounding continuum. We then 
use the wavelength dependent mean intensities J A and approxi- 
mate A operators A* to compute the profile integrated line mean 
intensities J and A* via 

J = J (f>(A)J A dA 
and 

A* = j (f>(A)A* dX. 

J and A* are then used to compute an updated value for J and 
the line source function 

S = (l-e)J+eB 

where e is the line thermalization parameter (0 for a purely ab- 
sorptive line, 1 for a purely scattering line). B is the Planck func- 
tion, B \, profile averaged over the line 

B = j 4>(A)B A dA 

via the standard iteration method 

[1 - A*(l - €)] /new = As - A*(l - e)J oii , 

where Jf s = A5 id- This equation is solved directly to get the 
new values of J which is then used to compute the new source 
function for the next iteration cycle. 

We construct the line A* directly from the wavelength de- 
pendent A*'s generated by the solution of the continuum trans- 
fer problems. For practical reasons, we use in this paper only 
the nearest neighbor A* discussed in paper I. Larger A*s require 
significantly more storage and small test cases indicate that they 
do not decrease the number of iterations enough to warrant their 
use as long as they are not much larger than the nearest neighbor 
A*. 

3. Application examples 

We use the framework discussed in paper I as the baseline for 
the line transfer problems discussed in this paper. In addition to 
the highly efficient parallelization of solid angle space, we have 
implemented a parallelization over wavelength space using the 
MPI distributed memory model. For static configurations (or for 
configurations with velocity fields treated in the Eulerian frame) 
there is no direct coupling between different wavelength points. 

Our basic setup is similar to that discussed in paper I. We 
use a sphere with a grey continuum opacity parameterized by a 
power law in the continuum optical depth T s a- The basic model 
parameters are 

1. Inner radius r c = 10 13 cm, outer radius r out = 1.01xl0 15 cm. 

2. Minimum optical depth in the continuum r™" = 10~ 4 and 
maximum optical depth in the continuum t™ x = 1 . 

3. constant temperature structure with T = 10 4 K or 

4. grey temperature structure with T e ff = 10 4 K. 

5. Outer boundary condition Zr = and diffusion inner bound- 
ary condition for all wavelengths. 

6. Continuum extinction^. = C/r 2 , with the constant C fixed 
by the radius and optical depth grids. 



7. Parameterized coherent & isotropic continuum scattering by 
defining 

Xc = e c K c + (1 - e c )o- c 

with < e c < 1. k c and <x c are the continuum absorption and 
scattering coefficients. 

The line of the simple 2-level model atom is parameterized 
by the ratio of the profile averaged line opacity \i to the con- 
tinuum opacity Xc and the line thermalization parameter e\. For 
the test cases presented below, we have used e c = 1 and a con- 
stant temperature and thus a constant thermal part of the source 
function for simplicity (and to save computing time) and set 
XilXc = 10 6 to simulate a strong line, with varying e/ (see be- 
low). With this setup, the optical depths as seen in the line range 
from 10~ 2 to 10 6 . We use 32 wavelength points to model the full 
line profile, including wavelengths outside the line for the con- 
tinuum. We did not require the line to thermalize at the center of 
the test configurations, this is a typical situation one encounters 
in a full 3D configurations as the location (or even existence) of 
the thermalization depths becomes more ambiguous than in the 
ID case. 

The sphere is put at the center of the Cartesian grid, which is 
in each axis 10% larger than the radius of the sphere. For the test 
calculations we use voxel grids with the same number of spatial 
points in each direction (see below). The solid angle space was 
discretized in (0, <p) with ng = if not stated otherwise. In the 
following we discuss the results of various tests. In all tests we 
use the LC method for the 3D RT solution. Unless otherwise 
stated, the tests were run on parallel computers using 128 CPUs. 
For the 3D solver we use n x ■ — n y — n, = 2 * 64 + 1 or n x — n y = 
n z = 2 * 96 + 1 points along each axis, for a total of 129 3 or 193 3 
spatial points, depending on the test case. The solid angle space 
discretization uses ng =n$ = 64 points. 

3.1. LTE tests 

In this test we have set e\ — 1 to test the accuracy of the for- 
mal solution by comparing to the results of the ID code. The ID 
solver uses 64 radial points, distributed logarithmically in optical 
depth. In Fig.|2]we show the line mean intensities J as function 
of distance from the center for both the ID (+ symbols) and the 
3D solver. The results plotted in Fig.Q]show an excellent agree- 
ment between the two solutions, showing that the line 3D RT for- 
mal solution is comparable in accuracy with the corresponding 
ID formal solution. Note that the difference of the distribution 
of spatial points (linear in the 3D case, approximately logarith- 
mic in the ID case) causes a much lower resolution of the 3D 
calculations in the central regions and in a higher resolution of 
the 3D calculations in the outer regions compared to the ID test 
case. 

3.2. Tests with line scattering 

We have run a number of test calculation similar to the LTE case 
but with line scattering included. In Fig.|3]we show the results for 
e/ = 10~ 4 and in Fig. [4] we show the results for e\ = 1CT 8 as ex- 
amples. In both cases, the dynamical ranges of / are much larger 
than in the LTE case. The 3D calculations compare very well to 
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the ID calculations, in particular in the outer zones. In the inner 
parts the resolution of the 3D models is substantially lower than 
for the ID models, therefore the differences are largest there. 
In Figs. [3] and [4] we show the results for a test model with a 
grey temperature structure. In these models, the 3D spatial grid 
was substantially larger (n x - n y = n z = 2 * 96 + 1) in order 
to help resolve the inner regions where the temperature gradi- 
ents are very large. As expected, the agreement is not as good 
in the inner regions as in the models with a constant tempera- 
ture, however, the models agree very well in the outer regions. 
Overall, the agreement is very similar in quality compared to the 
case with constant temperatures. The differences in the mean in- 
tensity J between the ID comparison case and the 3D case is 
several per-cent in the innermost layers where the grid of the 
3D case is under-sampled, the differences are below 0. 1 % in the 
outer zones. 



Table 1. Scaling results for different parallel configurations, 
^worker is the number of MPI processes working on a for- 
mal solution for a single wavelength, Miuster is the number of 
^worker sets of processes working on different wavelength, the 
total number of MPI processes is Miuster X Mvorker- The column 
'FS+A*+OS step' gives teh wallclock time (in seconds) for the 
calculation of the first formal solution plus the construction of 
the A* operator plus the time for the first operator splitting step, 
the column 'FS+OS step' is the time for the second (and subse- 
quent) formal solution and operator splitting step. 
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FS+A*+OS step 
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3.3. Convergence 

The convergence properties of the line transfer tests presented 
here are shown in Figs. ITifTOl In each figure, we show the con- 
vergence rates, as measured by the relative corrections per iter- 
ation, for a number of test runs. In all tests show here we have 
used n A = n y = n, = 2*32+1 points and ng — — 32 solid angle 
points for the 3D test case and 64 radial points for the ID com- 
parison test. The iterations were started with Si = J = B at all 
spatial points, this initial guess causes a relative error of about 10 
in J at the outer zones for the case with e; = 10~ 2 and about 10 4 
in J at the outer zones for the case with e; = 1CT 8 . The plots show 
that the A iteration is useless even for the relatively benign case 
of e/ = 10~ 2 . The operator splitting method delivers much larger 
corrections and is substantially accelerated by the Ng method, 
similar to the results shown in Paper I. The nearest-neighbor op- 
erator gives substantially better convergence rates than the diag- 
onal operator, cf. Fig [7] for the test cases with with e/ < 10~ 2 
the convergence behavior of the diagonal operator is unstable, 
the corrections tend to show oscillations. The nearest-neighbor 
operator shows stable convergence with quickly declining cor- 
rections for all test cases, its convergence rate can be accelerated 
with Ng's method. The total number of iterations required for 
the nearest-neighbor operator is essentially identical to the ID 
case with a tri-diagonal operator. 

3.4. Parallelization 

We have implemented a hierarchical parallelization scheme on 
distributed memory machines usi ng the MPI framework sim - 



ilar to the scheme discussed in Bar on & Hauschildtl 0998). 
Basically, the most efficient parallelization opportunities in the 
problem are the solid angle and wavelength sub-spaces. The to- 
tal number of solid angles in the test models is at least 4096, 
the number of wavelength points is 32 in the tests presented 
here but will be much larger in large scale applications. Thus 
even in the simple tests presented here, the calculations could 
theoretically be run on 131072 processors. The work required 
for each solid angle is roughly constant (the number of points 
that need to be calculated depends on the angle points) and dur- 
ing the formal solution process the solid angles are independent 
from each other. The mean intensities (and other solid angle in- 



tegrated quantities) are only needed after the formal solution for 
all solid angles is complete, so a single collective MPI opera- 
tion is needed to finish the computation of the mean intensities 
at each wavelength. Similarly, the wavelength integrated mean 
intensities J are needed only after the formal solutions are com- 
pleted for all wavelengths (and solid angles). Therefore, the dif- 
ferent wavelength points can be computed in parallel also with 
the only communication occurring as collective MPI operations 
after all wavelength points have been computed. We thus divide 
the total number of processes up in a number of 'wavelength 
clusters' (each working on a different set of wavelength points) 
that each have a number of 'worker' processes which work on 
a different set of solid angle points for any wavelength. In the 
simplest case, each wavelength cluster has the same number of 
worker processes so that Mot = Master x Marker where Mot is the 
total number of MPI processes, Miuster is the number of wave- 
length clusters and A^ wm k e r is the number of worker processes 
for each wavelength cluster. For our tests we could use a maxi- 
mum number of 128 CPUs on the HLRN IBM Regatta (Power4 
CPUs) system. In Table [1] we show the results for the 3 com- 
binations that we could run (due to computer time limitations) 
for a ei = 10~ 4 line transfer test case with 32 wavelength points, 
n x = n y = n z = 2 * 64 + 1 spatial points and ng — — 64 
solid angle points. For example, the third row in the table is for a 
configuration with 4 wavelength clusters, each of them using 32 
CPUs working in parallel on different solid angles, for a total of 
128 CPUs. The 3rd and 4th columns give the time (in seconds) 
for a full formal solution, the construction of the A* operator 
and an OS step (the first iteration) and the time for a formal so- 
lution and an OS step (the second iteration), respectively. As the 
A* has to be constructed only in the first iteration, the overall 
time per iteration drops in subsequent iterations. Similarly to the 
ID case, the construction of the A* is roughly equivalent to one 
formal solution. We have verified that all parallel configurations 
lead to identical results. The table shows that configurations with 
more clusters are slightly more efficient, mostly due to better 
load balancing. However, the differences are not really signifi- 
cant in practical applications so that the exact choice of the setup 
is not important. This also means that the code can easily scale 
to much larger numbers of processors since realistic applications 
will require much more than the 32 wavelength points used in the 
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test calculations. Note that the MPI parallelization can be com- 
bined with shared memory parallelization (e.g., using openMP) 
in order to more efficiently utilize modern multi-core processors 
with shared caches. Although this is implemented in the current 
version of the 3D code, we do not have access to a machine with 
such an architecture and it was not efficient to use openMP on 
multiple single core processors. 

4. Conclusions 

Using rather difficult test problems, we have shown that our 3D 
long-characteristics method gives very good results when com- 
pared to our well-tested ID code. The main differences are due 
to poorer spatial resolution close to the center of the grid in the 
3D case. The code has also been parallelized and scales to very 
large numbers of processors. Future work will examine a short- 
characteristics method of solution which should enable higher 
resolution grids with the same memory requirements and an ex- 
tension to full multi-level NLTE modeling. 
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Fig. 1. Comparison of the results obtained for the LTE line test 
with the ID solver (x symbols) and the 3D line solver. This fig- 
ure shows cuts along the x, y, and z axes of the 3D grid for a 
grid with n x = n y — n z — 2 * 64 + 1 spatial points. The ordinate 
axis shows the coordinates, the y axis the log of the mean inten- 
sity averaged over the line profiles (J) for cuts along the axes of 
the 3D grid. For the ID comparison case the ordinate shows ± 
distance from the center. 

Fig. 2. Comparison of the results obtained for the LTE line test 
with the ID solver (+ symbols) and the 3D line solver. The x 
axis shows the distances from the center of the sphere, the y axis 
the log of the mean intensity averaged over the line profiles (/). 
The 3D model uses n x = n y = n z = 2 * 64 + 1 spatial points. 
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Fig. 3. Comparison of the results obtained for the e\ = 10~ 4 
line test (constant T) with the ID solver (x symbols) and the 
3D line solver. This figure shows cuts along the x, y, and z axes 
of the 3D grid with n x = n y = n z = 2 * 64 + 1 spatial points. 
The ordinate axis shows the coordinates, the y axis the log of the 
mean intensity averaged over the line profiles (J) for cuts along 
the axes of the 3D grid. For the ID comparison case the ordinate 
shows + distance from the center. 



Fig. 4. Comparison of the results obtained for the e/ = 10~ 8 
line test (constant T) with the ID solver (x symbols) and the 
3D line solver. This figure shows cuts along the x, y, and z axes 
of the 3D grid with n x — n y — n z — 2 * 64 + 1 spatial points. 
The ordinate axis shows the coordinates, the y axis the log of the 
mean intensity averaged over the line profiles (J) for cuts along 
the axes of the 3D grid. For the ID comparison case the ordinate 
shows ± distance from the center. 

inc transfer, cpsilon = 1 c — A , grey T 




Fig. 5. Comparison of the results obtained for the e/ = 10 line 
test (grey T) with the ID solver (x symbols) and the 3D line 
solver. This figure shows cuts along the x, y, and z axes of the 
3D grid with n x = n y — «- = 2 * 96 + 1 spatial points. The 
ordinate axis shows the coordinates, the y axis the log of the 
mean intensity averaged over the line profiles (J) for cuts along 
the axes of the 3D grid. For the ID comparison case the ordinate 
shows ± distance from the center. 




Fig. 7. Convergence of the iterations for the line transfer case with e/ = 10 2 . The maximum relative corrections (taken over all 
spatial points) are plotted vs. iteration number. 




Fig. 8. Convergence of the iterations for the line transfer case with e/ = 10 . The maximum relative corrections (taken over all 
spatial points) are plotted vs. iteration number. 




Fig. 9. Convergence of the iterations for the line transfer case with e/ = 10 . The maximum relative corrections (taken over all 
spatial points) are plotted vs. iteration number. 
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Fig. 10. Convergence of the iterations for the line transfer case with different e/ as indicated in the legend. The maximum relative 
corrections (taken over all spatial points) are plotted vs. iteration number. The symbols without connecting lines are the convergence 
rates obtained without using Ng acceleration. 
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